ABSTRACT

Cascades are ubiquitous in various network environments.
Predicting these cascades is highly nontrivial yet vital in
applications such as viral marketing, epidemic prevention
and traffic management. Most previous works focus on
predicting the final cascade size. As cascades are typical
dynamic processes, it is also interesting and important to
predict the cascade size at any time, or to predict the time
when a cascade will reach a certain size (e.g. a threshold
for outbreak). In this paper, we unify all these tasks into a
fundamental problem: cascading process prediction. That is,
given the early stage of a cascade, how can we predict its
cumulative cascade size at any later time? For such a
challenging problem, it is essential to understand the micro
mechanism that drives and generates the macro phenomena
(i.e. cascading processes). Here we introduce behavioral
dynamics as the micro mechanism to describe the dynamic
process by which a node's neighbors get infected by a cascade
after the node itself gets infected (i.e. one-hop subcascades).
Through data-driven analysis, we identify the common
principles and patterns underlying behavioral dynamics and
propose a novel Networked Weibull Regression model for
behavioral dynamics modeling. We then propose a novel method
for predicting cascading processes by effectively aggregating
behavioral dynamics, together with a scalable solution that
approximates the cascading process with a theoretical
guarantee. We extensively evaluate the proposed method on a
large-scale social network dataset. The results demonstrate
that the proposed method significantly outperforms other
state-of-the-art baselines in multiple tasks, including cascade
size prediction, outbreak time prediction and cascading
process prediction.
1. INTRODUCTION

In a network environment, if decentralized nodes act on
the basis of how their neighbors acted at an earlier time,
these local actions often lead to interesting macro dynamics:
cascades. In online social networks, the information a user
can get and engage with depends heavily on what his/her
friends share, and thus information cascades naturally occur
and become the major mechanism for information
communication. There has been a growing body of research on
these information cascades because of their great potential in
various vital applications such as viral marketing, epidemic
prevention, and traffic management. Most of this work focuses
on characterizing information cascades and discovering
their patterns in structure, content and temporal dynamics.

Recently, predictive modeling of information cascades has
aroused considerable research interest. Earlier works focus on
predicting the final size of information cascades based on
content, behavioral and structural features. As only
large cascades are of interest in most real applications, Cui et
al. propose a data-driven approach to predicting whether
the final size will surpass a threshold for outbreak. More
recently, Cheng et al. go beyond the final size to continuously
predict whether a cascade will double its current size in the
future. They also raise the interesting question of whether
cascades can be predicted, and their experimental results
demonstrate that cascade size is highly predictable.
However, these previous works all concern cascade size,
which does not capture the whole of an information cascade.
An information cascade is a typical dynamic process, and the
temporal scale is critical for understanding the cascading
mechanism. It is also highly nontrivial to predict when a
cascade breaks out and, more ambitiously, to predict the
evolving process of a cascade (i.e. the cascading process, as
shown in Figure 1(a)). In this paper, we move one step forward
to ask: Is the cascading process predictable? That is, given
the early stage of an information cascade, can we predict its
cumulative cascade size at any later time?

It is apparent that the targeted problem is far more
challenging than those in previous works. The cascade-level
macro features commonly used for size prediction, such as the
content, growth speed and structure in the early stage,
are not distinctive and predictive enough for the cascade
size at any later time. A fundamental way to address this
problem is to look into the micro mechanism of cascading
processes. Intuitively, an information cascading process can
be decomposed into multiple local (one-hop) subcascades.



Figure 1: Illustration of cascading process prediction.
(a) Cascading process. (b) Partially observed cascade at $t$.
(c) ... of $p_i$.


When a node becomes involved in a cascade, one or more of its
offspring nodes will also become involved in the cascade with a
temporal scaling. If the dynamic process of these subcascades
can be accurately modeled, then the cascading process can be
straightforwardly predicted by an additive function of these
local subcascades.

Here we exploit behavioral dynamics as the micro mechanism
to represent the above-mentioned dynamic process of local
subcascades. Given a node that becomes involved in a cascade
at $t_0$, its behavioral dynamics aim to capture how the
cumulative number of its offspring nodes involved in the
cascade changes over time. By definition, this is a
non-decreasing counting process and can be well represented by
a survival model. A few recent works have exploited survival
theory to model how the occurrence of an event at a node
affects the time of its occurrence at other nodes (i.e. the
diffusion rate), and their results demonstrate the advantage of
continuous-time survival models in uncovering temporal
processes. However, their targeted problem is to uncover hidden
diffusion networks, and they thus assume the parameters of the
survival function on each edge to be fixed. This has the
undesirable consequence that all cascades with the same
root node (or the same early involved nodes) are predicted to
have the same cascading process, which makes these models
inapplicable to our problem.

In this paper, we propose a novel method for cascading
process prediction, as shown in Figure 1. Given the early
stage of a cascading process before $t$ in Figure 1(a), we
illustrate the partially observed cascade in Figure 1(b),
where nodes in green (red) represent the observed
(unobserved) nodes involved before (after) $t$. Given the
behavioral dynamics of node $p_i$, represented by its survival
rates, and the number of its offspring nodes that have become
involved before $t$, we can predict the cumulative number of
its offspring nodes involved in the cascade at any time
$t' > t$. After conducting similar predictions on all the
observed nodes, the cascading process after $t$ can be
predicted by an additive function over all local predictions
from behavioral dynamics.

More specifically, modeling behavioral dynamics and
predicting the cascading process based on continuous-time
survival theory entail several challenges. First, it is
unclear what distribution form the behavioral dynamics
follow. Although Exponential and Rayleigh distributions are
commonly used to characterize the temporal scaling of
pairwise interactions, behavioral dynamics in this paper
reflect collective behaviors and turn out to be inconsistent
with these simple distributions in real data. Second,
the parameters in survival models are difficult to interpret,
which limits the generality of the learned model. Given the
distribution form of the data, the parameters of a survival
model can always be learned from real data by maximum
likelihood. However, it is unclear what these parameters stand
for, and the learned model cannot be generalized to
out-of-sample nodes (i.e. nodes whose behavioral dynamics
are not included in the training data). Third, predictive
models based on survival theory are computationally expensive
due to their continuous-time nature, which makes them
infeasible in real applications. Thus, we intend to design an
effective and interpretable model for behavioral dynamics
modeling and a scalable solution for cascading process
prediction.

In particular, we conduct extensive statistical analysis on
large-scale real data and find that behavioral dynamics
cannot be well captured by simple distributions such as the
Exponential and Rayleigh distributions, but that the Weibull
distribution, which generalizes both, preserves the
characteristics of behavioral dynamics well. We also
discover strong correlations between the parameters of
a node's behavioral dynamics and the behavioral features of
its neighbor nodes. Motivated by these findings, we propose a
NEtworked WEibull Regression (NEWER) model for learning the
parameters of behavioral dynamics. In addition to the
maximum likelihood estimation term, we assume that the
parameters of a node can be regressed on the behavioral
features of its neighbor nodes, and thus impose networked
regularizers to improve the interpretability and generality of
the model. Based on the behavioral dynamics, we further
propose an additive model for cascading process prediction. To
make it scalable, we propose an efficient sampling strategy
for approximation with a theoretical guarantee.

We extensively evaluate the proposed method on a complete
dataset from a population-level social network in China,
including over 320 million users, 1.2 billion edges and 340
million cascades. In all the testing scenarios, the proposed
method significantly outperforms the baseline methods.
Figure 2 is a showcase of cascading process prediction
by the proposed method. We show that by accurately modeling
the behavioral dynamics of social network users, we can
predict the cascading process with a 2-hour leading time
window, and obtain an average precision of 0.97 if we restrict
the error rate of the size to 0.1. Moreover, accurate
predictions of the final cascade size and the cascade outbreak
time are both implied by the predicted cascading process.

The main contributions of this paper are: 

Figure 2: Showcase of cascading process prediction for a real
cascade. The red line represents the ground-truth cascading
process. The others are prediction results based on different
early stage information.

(1) Motivated by work on cascade size prediction, we
move one step forward to address the cascading process
prediction problem, which subsumes several vital problems such
as cascade size prediction, outbreak time prediction and
evolving process prediction.

(2) We identify the common principles and patterns underlying
behavioral dynamics and accordingly propose a novel Networked
Weibull Regression model for behavioral dynamics modeling,
which significantly improves the interpretability
and generality of traditional survival models.

(3) We propose a novel method for predicting the macro
cascading process by aggregating micro behavioral dynamics,
together with a scalable solution that approximates the
cascading process with a theoretical guarantee.


2. RELATED WORK 

Prediction on Cascades. In recent years, many methods
have been proposed to make predictions on cascades. Most
of them focus on predicting the future size of a cascade, and
a common approach is to select vital nodes and place sensors on
them. For example, Cohen et al. focus on exploring the
topological characteristics of the cascade. Cui et al. propose
to optimize the size prediction problem using dynamic
information. Cheng et al. introduce temporal features
into the problem and predict the growing size of the
cascade. Rather than attempting to predict the cascade size,
we focus on predicting the cascading process, which considers
time and volume information together.

Survival Model. A survival model analyzes the time duration
until one or more events happen. In recent years, researchers
have started to model information diffusion using
continuous-time models. Myers et al. proposed CONNIE to infer
the diffusion network based on convex programming while
keeping the transmission rate fixed. Later, Rodriguez et al.
[17] proposed NETRATE, which allows the transmission rate to
differ across edges. Subsequently, Rodriguez et al. gave
an additive model and a multiplicative model to describe
information propagation based on survival theory. Most of
these works focus on discovering the rules and patterns of
the edges in the social network and are hard to extend to
cascade prediction, since the transmission rates on different
edges are only weakly correlated. In contrast, our work
focuses on predictive modeling by grouping correlated
edges together so that we can make predictions for edges
based on the information of other edges.

Influence Modeling and Maximization. Influence modeling
and maximization aim to evaluate users' importance
in social networks. The problem was first proposed by Domingos
et al. to select early starters that trigger a large cascade.
Kempe et al. then proposed the Stochastic Cascade Model to
formalize the problem, and Chen et al. proposed a scalable
solution. Recently the approach has been extended by adding
opinion effects or time decay effects to the models.
Our work is distinct from existing works in the following
way: rather than quantifying the influence of nodes, we
predict the cascading process.

3. PRELIMINARIES 

This section presents the dataset, the discovered
patterns and the validated hypotheses that support the model
design and solution.

3.1 Dataset Description 

The dataset in this paper is from Tencent Weibo, one of
the largest Twitter-style websites in China. We collect all
the cascades generated in the 10 days between Nov 15th and
Nov 25th, 2011. The dataset contains in total 320 million
users with their social relations and 340 million cascades^1
with their explicit cascading processes. The distribution of
cascade size is shown in Figure 3. We can see that the
cascade size follows a power-law distribution and that the
majority of cascades are very small, which makes them of
little interest for many applications. As this paper aims to
predict the cascading process, we filter out the cascades of
size less than 5 and keep the remaining 0.59 million cascades
with an obvious cascading process for statistical analysis and
experiments.


Figure 3: Distribution of cascade size. The red
straight line is the linear fit to the blue curve,
showing that the size distribution follows a power law.


3.2 Characteristics of Behavioral Dynamics 

As mentioned before, behavioral dynamics play a central
role in uncovering and predicting cascading processes. Here we
investigate the characteristics of behavioral dynamics to
inform their modeling. By definition, the behavioral dynamics
of a user capture how the cumulative number of his/her
followers who retweet a post changes after the user retweets
the post. The behavioral dynamics of a user could then be
straightforwardly represented by averaging the size growth
curves of all subcascades that spread to the user and his/her
followers. However, Figure 4 shows that the size growth curves
vary significantly across different subcascades of the same
user, which means that such a representation is not fit to
characterize behavioral dynamics. Here we instead normalize
the size growth process by the cascade final size and adopt
the survival function to describe the behavioral dynamics,
where the survival rate represents the percentage of nodes
that have not yet been infected but eventually will be.
As shown in Figure 4, a user's survival function is quite
stable across different subcascades although their size growth
patterns vary.

| Model | Density function | Survival function | Hazard function | KS-statistic in Weibo |
|---|---|---|---|---|
| Exponential | $\lambda_i e^{-\lambda_i t}$ | $e^{-\lambda_i t}$ | $\lambda_i$ | 0.2741 |
| Power Law | $\frac{\alpha_i}{\delta}\left(\frac{t}{\delta}\right)^{-\alpha_i-1}$ | $\left(\frac{t}{\delta}\right)^{-\alpha_i}$ | $\frac{\alpha_i}{t}$ | 0.9893 |
| Rayleigh | $\alpha_i t\, e^{-\alpha_i t^2/2}$ | $e^{-\alpha_i t^2/2}$ | $\alpha_i t$ | 0.7842 |
| Weibull | $\frac{k_i}{\lambda_i}\left(\frac{t}{\lambda_i}\right)^{k_i-1} e^{-(t/\lambda_i)^{k_i}}$ | $e^{-(t/\lambda_i)^{k_i}}$ | $\frac{k_i}{\lambda_i}\left(\frac{t}{\lambda_i}\right)^{k_i-1}$ | 0.0738 |

Table 1: Parametric Models

^1 Here the cascades are information cascades. When a user
retweets or generates a post, several of his/her followers
will further retweet the post, and so on, forming an
information cascade.



Figure 4: The size growth curves and their corre¬ 
sponding survival function for 3 users. 


Can we then use the behavioral dynamics represented by the
survival function to predict the size growth curve of a
subcascade? With the assistance of early stage information,
the answer is positive. For example, if we know the
subcascade size at an early time $t_0$, then the survival
function can be straightforwardly transformed from the
percentage dimension into the size dimension.
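
The transformation from the percentage dimension to the size dimension can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the Weibull parameters and the observed count of 30 are hypothetical, and ages are measured from the moment the user retweeted.

```python
import math


def weibull_survival(age, lam, k):
    """Share of eventual retweeters that have NOT yet retweeted at a given age."""
    return math.exp(-((age / lam) ** k)) if age > 0 else 1.0


def project_size(observed, age_obs, age_query, survival):
    """Rescale the size observed at age_obs to the expected size at age_query."""
    infected_now = 1.0 - survival(age_obs)      # share of the final size already seen
    infected_later = 1.0 - survival(age_query)  # share expected by age_query
    if infected_now <= 0.0:
        raise ValueError("nothing observed yet, so the projection is undefined")
    return observed * infected_later / infected_now


if __name__ == "__main__":
    # Hypothetical user whose followers react with scale 2.0 and shape 1.0.
    s = lambda age: weibull_survival(age, lam=2.0, k=1.0)
    print(project_size(observed=30, age_obs=1.0, age_query=4.0, survival=s))
    print(project_size(observed=30, age_obs=1.0, age_query=float("inf"), survival=s))
```

Querying at an infinite age gives the projected final subcascade size, because the survival rate then drops to zero.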


3.3 Parametrize Behavioral Dynamics 

For ease of computation and modeling, we need to
parametrize the behavioral dynamics in our case. In the state
of the art, Exponential and Rayleigh distributions are often
used to describe the dynamics of user behaviors in different
settings [11]. Here we test these distributional hypotheses
on our real data and find that these distributions cannot
capture both the shape and scale characteristics of behavioral
dynamics well. Thus, we turn to the general form of the
Exponential and Rayleigh distributions, the Weibull
distribution [15], and find it adequate for parametrizing
behavioral dynamics. To quantify the effect of
parametrization, we calculate the KS-statistic for the
candidate distributions, as shown in Table 1. The Weibull
distribution performs much better than the Exponential and
Rayleigh distributions. The improvement is attributed to the
higher degree of freedom of the Weibull distribution, which
has two parameters, $\lambda$ and $k$, that respectively
control the scale and shape of the behavioral dynamics.
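
The kind of comparison summarized by the KS-statistic can be reproduced on synthetic data with the standard library alone. The sketch below is an illustration under assumptions, not the evaluation pipeline of the paper: the helper names are hypothetical, the data are simulated, and the Weibull shape is fitted by bisection on the usual maximum likelihood equation.

```python
import math
import random


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of the samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = cdf(x)
        gap = max(gap, abs((i + 1) / n - model), abs(model - i / n))
    return gap


def fit_exponential(samples):
    """Maximum likelihood rate of an exponential distribution: 1 / mean."""
    return len(samples) / sum(samples)


def fit_weibull(samples, lo=0.05, hi=20.0, iters=80):
    """Maximum likelihood (scale, shape); the shape equation is solved by bisection."""
    n = len(samples)
    mean_log = sum(math.log(x) for x in samples) / n

    def score(k):
        powers = [x ** k for x in samples]
        weighted = sum(p * math.log(x) for p, x in zip(powers, samples))
        return weighted / sum(powers) - 1.0 / k - mean_log  # increasing in k

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = (sum(x ** k for x in samples) / n) ** (1.0 / k)
    return lam, k


def compare_fits(samples):
    """Return (KS of exponential fit, KS of Weibull fit, fitted scale, fitted shape)."""
    rate = fit_exponential(samples)
    lam, k = fit_weibull(samples)
    ks_exp = ks_statistic(samples, lambda x: 1.0 - math.exp(-rate * x))
    ks_wb = ks_statistic(samples, lambda x: 1.0 - math.exp(-((x / lam) ** k)))
    return ks_exp, ks_wb, lam, k


if __name__ == "__main__":
    random.seed(7)
    # Simulated retweet delays with scale 2.0 and shape 0.6 (hypothetical values).
    delays = [x for x in (random.weibullvariate(2.0, 0.6) for _ in range(2000)) if x > 0]
    print(compare_fits(delays))
```

A smaller KS-statistic means the fitted distribution tracks the empirical one more closely, which is the sense in which Table 1 ranks the candidates.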


3.4 Covariates of Behavioral Dynamics 

If subcascades for all users are sufficient, the parameters 
of behavioral dynamics can be directly learned from data. 


| Feature | Description |
|---|---|
| Behavioral features | |
| inflow_rate | the number of posts the user received in a certain period |
| outflow_rate | the number of posts the user sent in a certain period |
| follower_avg_inflow_rate | average inflow rate of the user's fans, or $\frac{\sum_i retweet(i) \cdot inflow(i)}{\sum_i retweet(i)}$, where $i$ ranges over the fans of the user (likewise below) |
| follower_avg_retweet_rate | average retweet rate of the user's fans, or $\frac{\sum_i retweet(i) \cdot retweet\_rate(i)}{\sum_i retweet(i)}$ |
| Structural features | |
| follower_number | number of followers of the user |
| follow_number | number of users this user follows |

Table 2: Behavioral features for users.


However, this suffers from several drawbacks: (1) some users
may have no or very sparse subcascades in the training
dataset, which makes these users' behavioral dynamics
inaccurate or even unknown; (2) it is difficult to interpret
parameters learned directly from data, which prevents us from
gaining insight into the behavioral dynamics. To address these
drawbacks, we investigate the covariates of behavioral
dynamics. As the behavioral dynamics of a user capture the
collective responses of his/her followers, we assume that the
parameters of the user's behavioral dynamics are correlated
with the behavioral features of his/her followers (network
neighbors). Hence, we extract a set of behavioral features for
each user, as listed in Table 2. For each user with enough
subcascades in our dataset, we learn $\lambda$ and $k$
directly from data. We then calculate the correlations between
the learned parameters and the followers' collective
behavioral features. The examples given in Figure 5 indicate
obvious correlations between the learned parameters and these
behavioral features. Therefore, we can use these behavioral
features as covariates to regress the parameters of behavioral
dynamics.
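
The retweet-weighted follower averages listed in Table 2 can be computed as below. This is a small sketch: the dictionary layout and key names are hypothetical, and returning 0.0 when no follower has ever retweeted is an assumption, since the paper does not say how that case is handled.

```python
def weighted_follower_average(followers, feature):
    """Average a follower feature, weighting each follower by its retweet count."""
    total_weight = sum(f["retweet"] for f in followers)
    if total_weight == 0:
        return 0.0  # assumption: no retweeting followers means no signal
    return sum(f["retweet"] * f[feature] for f in followers) / total_weight


if __name__ == "__main__":
    fans = [
        {"retweet": 1, "inflow_rate": 10.0, "retweet_rate": 0.2},
        {"retweet": 3, "inflow_rate": 2.0, "retweet_rate": 0.6},
    ]
    print(weighted_follower_average(fans, "inflow_rate"))   # follower_avg_inflow_rate
    print(weighted_follower_average(fans, "retweet_rate"))  # follower_avg_retweet_rate
```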

3.5 From Behavioral Dynamics to Cascades 

After validating that behavioral dynamics can potentially
be accurately modeled and predicted, the key problem
is whether we can derive the macro cascading process from
micro behavioral dynamics. Intuitively, the cascading process
cannot be perfectly predicted at an early stage from
behavioral dynamics. Given any time $t$, we can only use the
behavioral dynamics of the users involved before $t$ to
predict the cascading process after $t$. Consequently, the
prediction coverage is restricted to the followers of these
users, while users beyond this scope are neglected. These
uncovered users may affect the performance of cascading
process prediction.


^2 We think that followers with different retweet numbers
have different effects on the user, so we modify the
weights on each term of follower_avg_inflow_rate and
follower_avg_retweet_rate.

Figure 5: Correlations between the survival function
parameters and the behavioral features

Fortunately, we observe two interesting phenomena in
real data.

Minor dominance. Although each user has behavioral
dynamics, the behavioral dynamics of different users make
significantly different contributions to the cascading process.
Intuitively, the behavioral dynamics of an active user
with 1 million followers contribute much more than those of
an inactive user with 5 followers. The data agree with this
intuition. According to Figure 6(a), a very small number of
nodes dominate the cascading process, which supports the
idea of using only the behavioral dynamics of these dominant
nodes for cascading process prediction.

Early stage dominance. Motivated by the minor
dominance phenomenon, we further ask whether the dominant
nodes tend to join cascades at an early stage. Figure 6(b)
depicts the time distribution of these dominant nodes joining
cascades.



Figure 6: Minor dominance and early stage dominance in
information cascades. (a) Size percentage covered by the most
dominant node. (b) The time percentage at which the dominant
nodes join cascades.

Taking these two phenomena together, it is safe to design a
model that exploits the behavioral dynamics of nodes infected
in the early stage to predict the cascading process.


4. METHODOLOGY 

This section introduces the NEtworked WEibull Regression
(NEWER) model and the cascade prediction methods in detail.

4.1 Problem Statement 

We are given a network $G = (U, A)$, where $U$ is a collection
of nodes and $A$ is the set of pairwise directed/undirected
relationships. An event (e.g., a tweet) can originate from
one node and spread (e.g., by retweeting) to its neighboring
nodes. A cascade is typically formed by repeating this
process. Therefore, a cascade can be represented by a set of
nodes $C = \{u_1, u_2, \ldots, u_m\}$, where $u_1$ is the root
node. In a cascade, each node gets infected by the event only
once, so the cascade is tree-structured. For every node $u_i$
in the cascade, we denote its parent node as $rp(u_i)$. The
time stamp at which $u_i$ gets infected is $t(u_i)$, and
$t(u_i) \le t(u_{i+1})$. Then the partial cascade before time
$t$ is denoted by $C_t = \{u_i \mid t(u_i) \le t\}$, and
its size is $size(C_t) = |C_t|$, where $|\cdot|$ is the
cardinality of a set. The cascade prediction problem can then
be defined as:

Cascade Prediction: Given the early stage of a cascade
$C_t$, predict the cascade size $size(C_{t'})$ with $t' > t$.
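
The notation of the problem statement maps onto a simple data structure. The sketch below is only an illustration; the class and function names are hypothetical and not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Infection:
    node: str               # u_i
    parent: Optional[str]   # rp(u_i); None for the root u_1
    time: float             # t(u_i)


def partial_cascade(cascade: List[Infection], t: float) -> List[Infection]:
    """C_t: the nodes of the cascade infected no later than t."""
    return [x for x in cascade if x.time <= t]


def cascade_size(cascade: List[Infection]) -> int:
    """size(C): the cardinality of the node set."""
    return len(cascade)


if __name__ == "__main__":
    full = [
        Infection("u1", None, 0.0),
        Infection("u2", "u1", 1.5),
        Infection("u3", "u2", 4.0),
    ]
    early = partial_cascade(full, t=2.0)
    # The prediction task: from `early`, estimate cascade_size(partial_cascade(full, t_later)).
    print(cascade_size(early), cascade_size(full))
```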

4.2 Survival Analysis 

Survival analysis is a branch of statistics that deals with
the analysis of the time duration until one or more events
happen, such as death in biological organisms and failure in
mechanical systems. It is a useful technique for cascade
prediction. More concretely, let $T_0$ be a non-negative
continuous random variable representing the waiting time until
the occurrence of an event, with probability density function
$f(t)$. The survival function

$$S(t) = P(T_0 \ge t) = \int_t^{\infty} f(x)\,dx \qquad (1)$$

encodes the probability that the event occurs after $t$. The
hazard rate is defined as the event rate at time $t$
conditional on survival until time $t$ or later
($T_0 \ge t$), i.e.,

$$h(t) = \lim_{dt \to 0} \frac{P(t \le T_0 < t + dt \mid T_0 \ge t)}{dt} = \frac{f(t)}{S(t)} \qquad (2)$$

$S(t)$ and $h(t)$ are the two core quantities in survival
analysis.


4.3 NEtworked WEibull Regression Model 

The Weibull distribution is commonly used in survival
analysis. In a network scenario, if we regard the time until
an event (e.g., a retweet) happens at a node as a survival
process, we can fit a Weibull distribution to the survival
time of node $i$. Its density, survival and hazard functions
are then

$$f_i(t) = \frac{k_i}{\lambda_i}\left(\frac{t}{\lambda_i}\right)^{k_i-1} e^{-(t/\lambda_i)^{k_i}} \qquad (3)$$

$$S_i(t) = e^{-(t/\lambda_i)^{k_i}} \qquad (4)$$

$$h_i(t) = \frac{k_i}{\lambda_i}\left(\frac{t}{\lambda_i}\right)^{k_i-1} \qquad (5)$$

where $t > 0$ is the event happening time at node $i$, and
$\lambda_i > 0$ and $k_i > 0$ are the scale and shape
parameters of the Weibull distribution. In the following we
assume that the network nodes are users and the event is
retweeting.
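
The three Weibull quantities are easy to check numerically. The following sketch uses the standard scale and shape parametrization; the function names are illustrative, and the sample parameter values are arbitrary.

```python
import math


def weibull_density(t, lam, k):
    """f(t) = (k / lam) * (t / lam)^(k - 1) * exp(-(t / lam)^k)."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def weibull_survival(t, lam, k):
    """S(t) = exp(-(t / lam)^k)."""
    return math.exp(-((t / lam) ** k))


def weibull_hazard(t, lam, k):
    """h(t) = (k / lam) * (t / lam)^(k - 1), which equals f(t) / S(t)."""
    return (k / lam) * (t / lam) ** (k - 1)


if __name__ == "__main__":
    t, lam = 1.3, 2.0
    for k in (0.7, 1.0, 2.0):
        ratio = weibull_density(t, lam, k) / weibull_survival(t, lam, k)
        print(k, weibull_hazard(t, lam, k), ratio)
```

With $k = 1$ the hazard is the constant $1/\lambda$ (the Exponential case), and with $k = 2$ it grows linearly in $t$ (the Rayleigh case), which is why the Weibull family generalizes both.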

Likelihood of retweeting dynamics. Suppose there
are $N$ users in total. $T_i$ is a set of $m_i$ time stamps,
and each element $T_{i,j}$ is the $j$-th retweet time stamp of
the post of the $i$-th user. We sort the time stamps in
increasing order so that $T_{i,j-1} \le T_{i,j}$. We assume
$T_{i,j} \ge 1$ and $T_{i,m_i} > 1$. Then the likelihood of
the event data can be written as follows:

$$L(\lambda, k) = \prod_{i=1}^{N} \prod_{j=1}^{m_i} h_i(T_{i,j}) \cdot S_i(T_{i,j}) \qquad (6)$$

$$\log L(\lambda, k) = \sum_{i=1}^{N} l_i(\lambda_i, k_i) \qquad (7)$$

where

$$l_i(\lambda_i, k_i) = m_i \log k_i + (k_i - 1) \sum_{j=1}^{m_i} \log T_{i,j} - m_i k_i \log \lambda_i - \lambda_i^{-k_i} \sum_{j=1}^{m_i} T_{i,j}^{k_i}$$
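
The per-user log-likelihood can be evaluated directly from the retweet delays. The sketch below is illustrative: the function names are hypothetical, and `scale_mle` is the unregularized closed form obtained by setting the derivative of the per-user term with respect to the scale to zero for a fixed shape, which is not the full NEWER estimate.

```python
import math


def log_likelihood_user(times, lam, k):
    """Sum over a user's retweet delays of log(h(T) * S(T)) under a Weibull model."""
    m = len(times)
    return (
        m * math.log(k)
        + (k - 1) * sum(math.log(t) for t in times)
        - m * k * math.log(lam)
        - lam ** (-k) * sum(t ** k for t in times)
    )


def scale_mle(times, k):
    """Scale maximizing the likelihood for a fixed shape: (mean of T^k)^(1/k)."""
    return (sum(t ** k for t in times) / len(times)) ** (1.0 / k)


if __name__ == "__main__":
    delays = [1.0, 1.5, 2.0, 4.0]  # hypothetical retweet delays, all >= 1
    k = 0.8
    lam = scale_mle(delays, k)
    print(lam, log_likelihood_user(delays, lam, k))
```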

As discovered in Section 3.4, the survival characteristics
of a user are correlated with his/her behavioral features.
We can therefore parametrize the parameters of the
personalized Weibull distributions using those behavioral
features. More formally, let $x_i$ be an $r$-dimensional
feature vector for user $i$. We parametrize $\lambda_i$ and
$k_i$ with the following linear functions:

$$\log \lambda_i = \log x_i \cdot \beta \qquad (8)$$

$$\log k_i = \log x_i \cdot \gamma \qquad (9)$$

where $\beta$ and $\gamma$ are $r$-dimensional parameter
vectors for $\lambda$ and $k$. We attempt to find the scale
and shape parameters of every user so that the likelihood of
the observed data is maximized; at the same time we obtain the
parameter vectors for out-of-sample extensions.
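
The out-of-sample extension amounts to evaluating the two log-linear maps on a new user's features. The sketch below is an illustration with made-up coefficients; it assumes every feature is strictly positive so that its logarithm exists, which the paper does not discuss (zero counts would need an offset).

```python
import math


def weibull_params(features, beta, gamma):
    """Map a positive feature vector to (scale, shape) through the log-linear model."""
    log_x = [math.log(x) for x in features]
    log_lam = sum(lx * b for lx, b in zip(log_x, beta))
    log_k = sum(lx * g for lx, g in zip(log_x, gamma))
    return math.exp(log_lam), math.exp(log_k)


if __name__ == "__main__":
    # Hypothetical user: follower_avg_inflow_rate = 50, follower_number = 1200.
    lam, k = weibull_params([50.0, 1200.0], beta=[0.3, 0.1], gamma=[-0.05, 0.0])
    print(lam, k)
```

Because the model is linear in the logarithms, each parameter is a product of powers of the features, for example $\lambda_i = \prod_j x_{i,j}^{\beta_j}$.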

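We use Equations (8) and (9) to replace $\lambda_i$ and $k_i$
in the log-likelihood function of Equation (7) to solve for
the parameters. To further enhance the interpretability, we
also add $\ell_1$ sparsity regularizers on $\beta$ and
$\gamma$ respectively to enforce model sparsity. Combining
everything together, we obtain the NEtworked WEibull
Regression (NEWER) formulation, which aims to minimize the
following objective:

$$F(\lambda, k, \beta, \gamma) = G_1(\lambda, k) + \mu G_2(\beta, \lambda) + \eta G_3(\gamma, k) \qquad (10)$$

$$G_1(\lambda, k) = -\log L(\lambda, k) \qquad (11)$$

$$G_2(\beta, \lambda) = \frac{1}{2N} \left\| \log \lambda - \log X \cdot \beta \right\|_2^2 + \alpha_\beta \|\beta\|_1 \qquad (12)$$

$$G_3(\gamma, k) = \frac{1}{2N} \left\| \log k - \log X \cdot \gamma \right\|_2^2 + \alpha_\gamma \|\gamma\|_1 \qquad (13)$$

where $\log X$ stacks the vectors $\log x_i$ row by row.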

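Optimization. To minimize $F(\lambda, k, \beta, \gamma)$ in
Equation (10), we first prove that the function is lower
bounded. We have the following theorem.

Theorem 1. $F(\lambda, k, \beta, \gamma)$ has a global minimum.

Proof. See the appendix. □

With this theorem, the following coordinate descent strategy
can be used to solve the problem with guaranteed convergence.
At each iteration, we solve the problem with respect to one
group of variables while keeping the others fixed.

For $it = 1, \ldots, it_{max}$:

$$\begin{aligned}
\lambda^{[it+1]} &= \arg\min_{\lambda} F(\lambda, k^{[it]}, \beta^{[it]}, \gamma^{[it]}) \\
k^{[it+1]} &= \arg\min_{k} F(\lambda^{[it+1]}, k, \beta^{[it]}, \gamma^{[it]}) \\
\beta^{[it+1]} &= \arg\min_{\beta} F(\lambda^{[it+1]}, k^{[it+1]}, \beta, \gamma^{[it]}) \\
\gamma^{[it+1]} &= \arg\min_{\gamma} F(\lambda^{[it+1]}, k^{[it+1]}, \beta^{[it+1]}, \gamma)
\end{aligned} \qquad (14)$$

For the subproblems with respect to $\lambda$ or $k$, we use
Newton's method. For the subproblems with respect to $\beta$
and $\gamma$, we use a standard LASSO solver.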
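
For readers unfamiliar with LASSO solvers, the regression subproblems can be illustrated with plain cyclic coordinate descent and soft-thresholding. This is a generic textbook sketch, not the solver used in the paper. It assumes the subproblem has the common form $\frac{1}{2N}\|y - Xb\|_2^2 + \alpha\|b\|_1$, where `y` would hold the current $\log \lambda$ (or $\log k$) values and `X` the log-features; all names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=200):
    """Minimize (1 / (2N)) * ||y - X b||^2 + alpha * ||b||_1 one coordinate at a time."""
    n, r = len(X), len(X[0])
    b = [0.0] * r
    for _ in range(sweeps):
        for j in range(r):
            # Correlation of column j with the residual that leaves coordinate j out.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][l] * b[l] for l in range(r) if l != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return b


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [2.0, -1.0, 1.0]
    print(lasso_coordinate_descent(X, y, alpha=0.0))   # ordinary least squares solution
    print(lasso_coordinate_descent(X, y, alpha=0.2))   # shrunken coefficients
```

Larger values of `alpha` drive more coefficients exactly to zero, which is the sparsity effect the regularizers are meant to provide.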


4.4 Efficient cascading process prediction 

It should be borne in mind that cascading process prediction
is intended to predict, at an early stage, the cascade size at
any later time. In the following we present two models to
achieve this goal.

4.4.1 Basic Model 


The entire flow of the basic model we propose is illustrated
in Algorithm 1.

Algorithm 1 Basic Model
Input:
    Set of users U involved in the cascade C before time t_limit,
    survival functions of users S_{u_i}(t), predicting time t_e
Output:
    Size of cascade size(C_{t_e})

 1: for all user u_i in U do
 2:     create a subcascade process with replynum(u_i) = 0
 3:     if u_i is not root node then
 4:         replynum(rp(u_i)) = replynum(rp(u_i)) + 1
 5:     end if
 6: end for
 7: sum = 1
 8: for all user u_i in U do
 9:     deathrate(u_i) = max(1/|V|, 1 - S_{u_i}(t_limit - t(u_i)))
10:     fdrate(u_i) = max(1/|V|, 1 - S_{u_i}(t_e - t(u_i)))
11:     sum = sum + replynum(u_i) * fdrate(u_i) / deathrate(u_i)
12: end for
13: return size(C_{t_e}) = sum


When a new node $u_i$ is added into the cascade at $t(u_i)$,
the algorithm launches a process to estimate the final size
of the subcascade that $u_i$ will generate, with temporal size
counter $replynum(u_i)$ and survival function $S_{u_i}(t)$
starting at $t(u_i)$. If $u_i$ is involved by others, the
algorithm also increases the temporal size of the retweet set
of its parent $rp(u_i)$ by one.

After all the information before the deadline is collected,
the result is finalized by aggregating the values estimated by
every subcascade process. Since the post number is at most
$|V|$ (all nodes in the network are involved in the cascade),
the death rate $deathrate(u_i)$ and the final death rate
$fdrate(u_i)$ (the complements of the survival rates) at
line 9 and line 10 are set to $1/|V|$ when they are lower
than $1/|V|$.

Complexity Analysis. Only constant-time operations are
involved in the two for-loops. Therefore, the complexity of
the algorithm is $O(n)$, where $n$ is the number of users in
the cascade.
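
A direct transcription of the basic model into Python may help in following the bookkeeping. This is a sketch under assumptions, not the authors' implementation: the tuple layout `(user, parent, infection_time)`, the argument names and the example survival function are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Survival = Callable[[float], float]  # maps an age to a survival rate in [0, 1]


def basic_model(
    infections: List[Tuple[str, Optional[str], float]],
    survival: Dict[str, Survival],
    t_limit: float,
    t_e: float,
    network_size: int,
) -> float:
    """Estimate the cascade size at t_e from the users observed before t_limit."""
    floor = 1.0 / network_size
    replynum = {user: 0 for user, _, _ in infections}
    infected_at = {user: t for user, _, t in infections}
    for _, parent, _ in infections:
        if parent is not None:          # every non-root user counts as a reply to its parent
            replynum[parent] += 1
    total = 1.0                         # the root itself
    for user in replynum:
        deathrate = max(floor, 1.0 - survival[user](t_limit - infected_at[user]))
        fdrate = max(floor, 1.0 - survival[user](t_e - infected_at[user]))
        total += replynum[user] * fdrate / deathrate
    return total


if __name__ == "__main__":
    s = lambda age: math.exp(-age / 2.0) if age > 0 else 1.0  # hypothetical dynamics
    seen = [("a", None, 0.0), ("b", "a", 1.0), ("c", "a", 2.0)]
    surv = {user: s for user, _, _ in seen}
    print(basic_model(seen, surv, t_limit=2.0, t_e=6.0, network_size=1000))
```

Setting `t_e` equal to `t_limit` returns the observed size, since every subcascade is then scaled by a factor of one.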


4.4.2 Sampling Model 

Although the basic model solves the estimation problem,
real applications often need to estimate the cascade size
dynamically so that changes can be monitored.

To make the algorithm scalable, the number of recalculations
should be limited, while the estimated size should stay within
an acceptable error range. We can utilize the following two
facts to make the estimation process more efficient: (1) For a
subcascade generated by $u_i$, the estimated size will always
be zero if no user is involved in it, which means we can skip
the calculation. (2) If we do not re-estimate the final number
of a subcascade (when no new user is involved in it), the
temporal size counter $replynum(u_i)$ and the final death rate
$fdrate(u_i)$ will not change, but the death rate
$deathrate(u_i)$ will increase over time. Supposing the
previous time stamp of the subcascade estimation is $t_0$,
this causes a relative error rate of
$deathrate_{u_i}(t_1) / deathrate_{u_i}(t_0) - 1$ at time
$t_1$. Hence, the relative error rate will be at most
$\epsilon$ if we re-estimate the final number of the
subcascade at
$t_1 = S_{u_i}^{-1}(1 - (1 + \epsilon) \cdot deathrate_{u_i}(t_0))$,
measured from the infection time $t(u_i)$. By exploiting these
two facts, we propose a sampling model, shown in Algorithm 2.
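
The re-estimation schedule implied by fact (2) can be sketched for a Weibull survival function, whose inverse is $S^{-1}(p) = \lambda(-\ln p)^{1/k}$. The code below is an illustration only: the function names and parameter values are hypothetical, and ages are measured from the infection time of the user.

```python
import math


def weibull_survival(age, lam, k):
    """Survival rate at a given age."""
    return math.exp(-((age / lam) ** k))


def weibull_survival_inverse(p, lam, k):
    """Age at which the survival rate has dropped to p, for 0 < p <= 1."""
    return lam * (-math.log(p)) ** (1.0 / k)


def reestimation_ages(lam, k, eps, network_size, first_age):
    """Ages at which a subcascade is re-estimated so the relative error stays within eps."""
    floor = 1.0 / network_size
    deathrate = max(floor, 1.0 - weibull_survival(first_age, lam, k))
    ages = []
    while (1.0 + eps) * deathrate < 1.0:
        deathrate *= 1.0 + eps  # the next threshold is (1 + eps) times the last death rate
        ages.append(weibull_survival_inverse(1.0 - deathrate, lam, k))
    return ages


if __name__ == "__main__":
    ages = reestimation_ages(lam=2.0, k=0.8, eps=0.1, network_size=10_000, first_age=0.0)
    print(len(ages), ages[:3])
```

Because the death rate starts no lower than $1/|V|$ and grows by a factor of $1 + \epsilon$ at each step until it reaches 1, the schedule contains at most $\log_{1+\epsilon}(|V|)$ entries per subcascade.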


Algorithm 2 Sampling Model
Input:
    survival functions of users S_{u_i}(t), and set of users U in
    one cascade C (given dynamically);
Output:
    for every size prediction request to t_e at t_0, output
    size(C_{t_e});

sum = 0;
while request = model.acceptRequest do
    switch (request.type)
    case APPROXIMATION:
        return size(C_{t_e}) = sum
    case INVOLVED_USER:
        u_i = request.user, t_0 = request.time
        create a subcascade process:
            t(u_i) = t_0, app(u_i) = 0, replynum(u_i) = 0,
            fdrate(u_i) = max(1/|V|, 1 - S_{u_i}(t_e - t_0));
        if u_i is root node then
            sum = 1;
        else
            t_rep = t_0 - t(rp(u_i));
            replynum(rp(u_i)) = replynum(rp(u_i)) + 1;
            sum = sum - app(rp(u_i));
            deathrate(rp(u_i)) = max(1/|V|, 1 - S_{rp(u_i)}(t_rep));
            app(rp(u_i)) = replynum(rp(u_i)) * fdrate(rp(u_i)) / deathrate(rp(u_i));
            sum = sum + app(rp(u_i));
            t_new = S^{-1}_{rp(u_i)}(1 - (1 + e) * deathrate(rp(u_i))) + t(rp(u_i));
            sendRequest(THRESHOLD_CHANGE, rp(u_i), t_new);
        end if
    case THRESHOLD_CHANGE:
        u_i = request.user, t_0 = request.time
        sum = sum - app(u_i);
        deathrate(u_i) = max(1/|V|, 1 - S_{u_i}(t_0 - t(u_i)));
        t_new = S^{-1}_{u_i}(1 - (1 + e) * deathrate(u_i)) + t(u_i);
        sendRequest(THRESHOLD_CHANGE, u_i, t_new);
        app(u_i) = replynum(u_i) * fdrate(u_i) / deathrate(u_i);
        sum = sum + app(u_i);
    end switch
end while

Complexity Analysis. The following theorem analyzes
the complexity of Algorithm 2.

Theorem 2. With an overall $O(n \log_{1+\epsilon}(|V|))$
counting to estimate the number of subcascades, the sampling
model can approximate the final size of the whole cascade at
any time with a relative error rate of at most $\epsilon$.

Proof. For each approximation request, we only need
to report the number directly; for every new subcascade,
the initial operation number is also constant, and we need
to do at most $O(\log_{1+\epsilon}(|V|))$ threshold
adjustments for each subcascade that has users involved in it,
since the lower bound of deathrate is $1/|V|$ and the upper
bound is 1 (all the people are involved in the cascade).
Above all, the final complexity is
$O(t) + O(n) + O(n \log_{1+\epsilon}(|V|)) = O(t + n \log_{1+\epsilon}(|V|))$
for each cascade (with $n$ users and $t$ requests). If we put
this algorithm into an online environment, the complexity will
be $O(T + N \log_{1+\epsilon}(|V|)) \approx O(T)$ for all the
cascades with $N$ users in total (we see
$\log_{1+\epsilon}(|V|)$ as a constant with respect to $T$,
and $N \sim T$ as the number of users involved in cascades
increases over time). □

With this model, for cascade final size prediction, we just
need to set the prediction time $t_e$ to be infinite so that
the final death rate of all subcascades will be 1. For
outbreak time prediction, we can make a binary search with
respect to time $t_e$, checking whether the cascade size will
be more or less than the size number at $t_{mid}$ and making
the decision each time.


5. EXPERIMENTS 

In order to evaluate the performance and fully demonstrate the advantages of the proposed method, we conduct a series of experiments on the dataset introduced in Section 3.1. The results of multiple tasks are reported, including cascade size prediction, outbreak time prediction and cascading process prediction.

5.1 Baselines and Evaluation Metrics 

Since we are the first to investigate the cascading process prediction problem, no previous models can be adopted as direct baselines. Here, we implement the following methods, which can potentially be applied to our targeted problem, as baselines:

• Cox Proportional Hazard Regression Model (Cox): This model assumes that the behavioral dynamics of all users have different scale parameters while sharing the same shape parameter. We use the same covariates as in our model and find the optimal scale parameters for all users and the shared shape parameter. We implement it as in [?].

• Exponential/Rayleigh Proportional Hazard Regression Model (Exponential/Rayleigh): Since the shape parameters of both the Exponential and Rayleigh distributions are fixed values (1 for the Exponential distribution and 2 for the Rayleigh distribution), they are two special cases of the Cox model.

• Log-linear Regression Model (Log-linear): We refer to [?], which extracted 4 classes of features to characterize cascades, including node features, structural features of cascades, temporal features and content features. In our case, we ignore the content features, which are not covered in our dataset and are also reported by [?] to be unimportant for cascade prediction. Then we use a log-linear regression model to predict the cascade size.
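The relation between the Exponential and Rayleigh baselines and a shape-parameterized model can be checked numerically. The sketch below assumes the Weibull survival form $S(t) = \exp(-(t/\lambda)^k)$ with scale $\lambda$ and shape $k$; the function names are illustrative and not taken from the paper.

```python
import math


def weibull_survival(t, scale, shape):
    """Assumed Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def exponential_survival(t, rate):
    """Exponential survival function exp(-rate * t)."""
    return math.exp(-rate * t)


def rayleigh_survival(t, sigma):
    """Rayleigh survival function exp(-t^2 / (2 * sigma^2))."""
    return math.exp(-(t ** 2) / (2.0 * sigma ** 2))


t = 3.0
# Shape 1 with scale 1/rate reproduces the Exponential model.
print(weibull_survival(t, 1.0 / 0.25, 1.0), exponential_survival(t, 0.25))
# Shape 2 with scale sigma * sqrt(2) reproduces the Rayleigh model.
print(weibull_survival(t, 2.0 * math.sqrt(2.0), 2.0), rayleigh_survival(t, 2.0))
```

Fixing the shape to 1 or 2 leaves only the scale to be fitted, which is what distinguishes these two baselines from models that learn the shape.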

Note that Log-linear can only predict cascade size and cannot be used for time-related prediction, while the Cox, Exponential and Rayleigh models are applied to all prediction tasks. Also, the goal of the Cox, Exponential and Rayleigh models is to model the behavioral dynamics. After that, we use the same cascade prediction model as in our method to conduct cascade-level predictions.

For each cascade, our dataset includes its complete cascading process as the groundtruth. Next, we use the following metrics to evaluate the performance:

¹ A user is counted multiple times if that user is involved in multiple cascades.













• Root Mean Square Log Error (RMSLE): In power-law distributed data, it is not reasonable to use the standard RMSE to evaluate the prediction accuracy. For example, for a cascade with a groundtruth size of 1000, predicting its size to be 2000 is very different from predicting it to be 0, but both predictions have the same RMSE. Thus, we first take the logarithm of both the groundtruth and the predicted value, then calculate the RMSE on the logarithmic results to evaluate the accuracy of the proposed method and the baselines.

• Precision with $\sigma$-Tolerance ($\delta_\sigma$-Precision): In real applications, a small deviation from the groundtruth value is often acceptable. In our case, we regard a predicted value within the range groundtruth $\cdot (1 \pm \sigma)$ as a correct prediction, and the resulting precision is the $\delta_\sigma$-Precision.
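Both metrics are short enough to state as code. The following is a minimal sketch with illustrative function names; using `log(1 + x)` instead of `log(x)` is an assumption made here so that a predicted size of 0 stays finite, and is not specified in the text.

```python
import math


def rmse(truth, pred):
    """Standard root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(truth, pred)) / len(truth))


def rmsle(truth, pred):
    """RMSE computed on log(1 + x) of the groundtruth and predicted sizes."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(truth, pred)]
    return math.sqrt(sum(squared) / len(truth))


def tolerance_precision(truth, pred, sigma):
    """Fraction of predictions inside [truth * (1 - sigma), truth * (1 + sigma)]."""
    hits = sum(1 for t, p in zip(truth, pred)
               if t * (1.0 - sigma) <= p <= t * (1.0 + sigma))
    return hits / len(truth)


# The example from the text: groundtruth 1000, predictions 2000 and 0.
print(rmse([1000], [2000]), rmse([1000], [0]))    # identical RMSE
print(rmsle([1000], [2000]), rmsle([1000], [0]))  # RMSLE separates them
print(tolerance_precision([100, 100, 100], [85, 119, 130], 0.2))
```

On the logarithmic scale the prediction of 0 is penalized far more heavily than the prediction of 2000, which matches the intuition in the RMSLE description.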

For parameter setting, there are 4 parameters in our method: $\mu$, $\eta$, $\alpha_\beta$ and $\alpha_\gamma$. We tune these parameters by grid search, and the optimal parameters used in our experiments are $\mu = 10$, $\eta = 10$, $\alpha_\beta = 6 \times 10^{-?}$, $\alpha_\gamma = 8 \times 10^{-?}$.

5.2 Cascade Size Prediction

Figure 7: RMSLE results of different methods with different numbers of observed nodes in cascades.

We randomly separate the cascades into 10 folds and conduct 10-fold cross validation, using 9 folds as training data and the remaining one as testing data. For cascades with size over k, we use the first s (s < k) nodes as observed data, and the target is to predict the final cascade sizes.
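The evaluation protocol above can be sketched as follows. This is an illustration only: the function names, the fixed random seed and the representation of a cascade as a list of nodes in infection order are assumptions, not details from the paper.

```python
import random


def k_fold_splits(items, num_folds=10, seed=0):
    """Yield (train, test) lists for each fold of a random k-fold split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield train, folds[held_out]


def observed_prefix(cascade_nodes, s):
    """First s nodes of a cascade (in infection order), used as observed data."""
    return cascade_nodes[:s]


def eligible_cascades(cascades, k):
    """Keep only cascades whose final size is over k."""
    return [c for c in cascades if len(c) > k]


splits = list(k_fold_splits(range(100)))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each cascade appears in exactly one test fold, so every cascade is predicted once by a model that never saw it during training.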

The prediction performance of all the methods is shown in Figure 7. It can be seen that the proposed method NEWER significantly outperforms the other baselines in RMSLE on datasets of different sizes. The baseline whose performance is closest to NEWER is the Cox model. The margin of improvement from Cox to NEWER is more obvious in datasets with larger k, and within a given dataset the margin is more evident with smaller s. These results demonstrate the significant advantage of NEWER in predicting large cascades at a very early stage.

Comparatively, the Log-linear method does not achieve satisfactory results in this task. The main reason is that the coefficients in the Log-linear model are highly biased towards the dominant number of small-sized cascades, which is also argued by [?]. In our method, we overcome this bias by shifting from macro cascade-level features to micro behavioral dynamics. The substantial gain achieved by all behavioral dynamics based methods (including NEWER, Cox, Exponential and Rayleigh) exemplifies the importance of this micro mechanism for cascade prediction.

In order to demonstrate the efficiency of the proposed method, we also evaluate the computational cost of NEWER and Sampling-NEWER on a machine with a 3.4 GHz quad-core Intel i7-3770 CPU and 16 GB of memory. We track the process of all cascades. The base cascade prediction model (Base) re-predicts the final size at every time point (in seconds), while the sampling-based cascade prediction model (Sampling) re-predicts the final size only when the observed cascade size increases. As shown in Table 3, the Sampling model (with a 10 percent performance degradation tolerance) is more efficient than the Base model by almost 5 orders of magnitude. According to Section [?], it is guaranteed that the Sampling method achieves an improvement of similar magnitude over the Base model in the cascading process prediction task, so we omit these results for brevity.

Method         Base Model       Improved Model ($\delta$ = 0.1)   Directed Learning Method
Size > 20      8.47 × 10^5 s    10.73 s                           899 s
Size > 50      7.61 × 10^5 s    8.62 s                            899 s
Size > 100     6.65 × 10^5 s    7.09 s                            898 s
Size > 500     4.35 × 10^5 s    4.33 s                            891 s
Size > 1000    3.4 × 10^5 s     3.30 s                            881 s

Table 3: Running time of different methods on different datasets, on a server with a 3.4 GHz quad-core Intel i7-3770 CPU and 16 GB of memory.

5.3 Outbreak Time Prediction 

Another interesting problem is to predict when a cascading outbreak will happen. For example, in the early stage of a cascade, can we predict when the cascade will reach a specific size? Without loss of generality, we set the outbreak size threshold to be 1000. We evaluate the prediction performance with different numbers of observed nodes in the cascades. As shown in Figure 8, the NEWER model achieves the best performance in both the RMSLE and $\delta_\sigma$-Precision metrics. Although the Exponential and Rayleigh models report better results than NEWER in the very early stage (fewer than 50 observed nodes), their performance improves less with an increasing number of observed nodes than that of NEWER.

Figure 8: Outbreak time prediction results of different methods with different numbers of observed nodes in cascades.
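Outbreak time prediction reduces to a binary search over the prediction time, provided the predicted cumulative size is non-decreasing in time. The sketch below assumes such a monotone size predictor is available as a callable; `predict_size`, the search bounds and the toy growth curve are hypothetical stand-ins for the cascade prediction model.

```python
import math


def predict_outbreak_time(predict_size, threshold, t_low, t_high, tol=1e-3):
    """Earliest time in [t_low, t_high] at which predict_size(t) >= threshold.

    predict_size must be non-decreasing in t. Returns None if the threshold
    is not reached by t_high.
    """
    if predict_size(t_high) < threshold:
        return None
    while t_high - t_low > tol:
        t_mid = (t_low + t_high) / 2.0
        if predict_size(t_mid) >= threshold:
            t_high = t_mid  # outbreak happens at or before t_mid
        else:
            t_low = t_mid   # outbreak happens after t_mid
    return t_high


def toy_size(t):
    """Toy saturating growth curve (hypothetical), with a final size of 2000."""
    return 2000.0 * (1.0 - math.exp(-t / 10.0))


print(predict_outbreak_time(toy_size, 1000.0, 0.0, 100.0))  # close to 10 * ln 2
print(predict_outbreak_time(toy_size, 3000.0, 0.0, 100.0))  # never reached: None
```

Each iteration halves the search interval, so the number of size predictions grows only logarithmically with the ratio of the search range to the tolerance.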

5.4 Cascading Process Prediction 

The ultimate purpose of this paper is to predict the cascading process. For each cascade, we use $\delta t$ to represent the early stage window and $t_{end}$ to represent its ending time. Then we use the cascade information during $[0, \delta t]$ to predict the cascading process during $[\delta t, t_{end}]$. At any time $t \in [\delta t, t_{end}]$, we check whether the predicted cascade size at $t$ is within the $\sigma$ tolerance of the groundtruth size at $t$. Then we calculate the $\delta_\sigma$-Precision by integrating over $t$ to describe the prediction accuracy for this cascading process. Finally, we average the $\delta_\sigma$-Precision over all cascades and show the results in Figure 9. Here, we vary the early stage percentage (i.e. $\delta t / t_{end}$) from 0 to 50%, and find that in all settings of the early stage percentage, NEWER always achieves the best performance in cascading process prediction. Moreover, the advantage of NEWER is clearer for smaller early stage percentages. When we set the early stage to be 15% of the whole cascade duration, we get a $\delta_{0.2}$-Precision of 0.849. That means we can correctly predict the cascade sizes at 84.9% of time points, which indicates that the cascading process is predictable and that the proposed method is adequate and superior in cascading process prediction. Furthermore, changing the precision tolerance value $\sigma$ does not affect the relative results of the methods in our experiments, and the precision value becomes smaller when $\sigma$ is set smaller. For brevity, we only report the results for $\sigma = 0.2$, which is a reasonable tolerance in most application scenarios.



Figure 9: Cascading process prediction accuracy of different methods under different early stage percentage settings (x-axis: early stage percentage).
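The per-cascade process precision can be approximated by sampling the prediction window on a time grid. This is a sketch under stated assumptions: the evenly spaced grid stands in for the integration over time, and the function name and the callables for groundtruth and predicted size are hypothetical.

```python
def process_precision(truth_size, predicted_size, t_start, t_end, sigma, num_points=100):
    """Fraction of grid points in [t_start, t_end] where the predicted size is
    within a relative tolerance sigma of the groundtruth size."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    hits = 0
    for i in range(num_points):
        t = t_start + (t_end - t_start) * i / (num_points - 1)
        truth = truth_size(t)
        if abs(predicted_size(t) - truth) <= sigma * truth:
            hits += 1
    return hits / num_points


def mean_process_precision(per_cascade_precisions):
    """Average the per-cascade precision values over all cascades."""
    return sum(per_cascade_precisions) / len(per_cascade_precisions)


# Toy cascade: constant groundtruth 100, prediction drifting upward by 8 per unit time.
p = process_precision(lambda t: 100.0, lambda t: 100.0 + 8.0 * t, 0.0, 4.0, 0.2, num_points=5)
print(p)  # grid points t = 0, 1, 2 are within tolerance; t = 3, 4 are not
```

A finer grid gives a closer approximation of the time integral at the cost of more size predictions per cascade.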

5.5 Out-of-sample Prediction 

In real applications, the interaction information between nodes is not always available, so some nodes’ behavioral dynamics cannot be directly derived from data by maximum likelihood estimation. We call these nodes out-of-sample nodes. This is the main reason why we propose NEWER to incorporate the covariates of behavioral dynamics. In order to evaluate the performance of NEWER in handling this case, we simulate the scenario by hiding the interaction information of a randomly selected 10% of users as out-of-sample users, and then predict the final sizes of the cascades in which these users are involved at early stages.

Figure 10: Prediction results for unknown (out-of-sample) users. Panels: cascades with size at least 300, at least 600, and at least 1000. Methods shown: NEWER, Wbl, Cox.

In the Cox model, the scale parameters in the behavioral dynamics of out-of-sample users can be regressed from the covariates. For the shape parameter, we calculate the average value of the shape parameters of observed users and apply this value to the out-of-sample users. In the NEWER model, both shape and scale parameters can be regressed from the covariates with the learned $\beta$ and $\gamma$. We also employ the standard Weibull Regression (Wbl) as a baseline, which can be derived by simply setting $\mu$ and $\eta$ to 0 in Equation (?). Then we use the averaged shape and scale parameters of observed users as the parameters of out-of-sample users.


As shown in Figure 10, the NEWER model significantly and consistently outperforms the Cox and Wbl models in out-of-sample prediction, which demonstrates that the covariates discovered from the behavioral features of a user’s networked neighbors can effectively predict the user’s behavioral dynamics. Also, we visualize the regression coefficients $\beta$ and $\gamma$ in Figure 11. It can be observed that the behavioral features of a user’s followers play more important roles in predicting both scale and shape parameters for the user, while the user’s structural features are less important.

Figure 11: Parameter coefficients. Panels: scale, shape.

6. CONCLUSIONS 

In this paper, we raise an important and interesting question: beyond predicting the final size of a cascade, can we predict the whole cascading process if the early stage information of the cascade is given? To address this problem, we propose to uncover and predict the macro cascading process with micro behavioral dynamics. Through data-driven analysis, we find the common principles and important patterns underlying behavioral dynamics, and propose a novel NEWER model for behavioral dynamics modeling with good interpretability and generality. After that, we propose a scalable method to aggregate micro behavioral dynamics into macro cascading processes. Extensive experiments on a large-scale real dataset demonstrate that the proposed method achieves the best results in various cascade prediction tasks, including cascade size prediction, outbreak time prediction and cascading process prediction.

Appendix: Proof of Theorem 1 


Proof. It is evident that both $G_2(\beta, \lambda)$ and $G_3(\gamma, k)$ have a global minimum value. Next we prove that $G_1(\lambda, k)$ also has a global minimum value, that is, that $\log L(\lambda, k)$ has a global maximum value.

Let $\lambda' = [\ldots]$, so that $\log L'(\lambda', k) = \log L(\lambda, k) = [\ldots]$, where $l_i'(\lambda_i', k_i) = m_i \log k_i + (k_i - 1)[\ldots] + m_i \log \lambda_i' - \lambda_i' [\ldots]$. The partial derivatives of the $l_i'$ are given by: $[\ldots]$

Since $\partial^2 l_i' / \partial \lambda_i'^2 < 0$ and $\partial^2 l_i' / \partial k_i^2 < 0$, the conditional marginal posterior densities of the parameters $\lambda_i'$ and $k_i$ are log-concave. Moreover, when $[\ldots]$, which means there should be a global maximum of $l_i'$, and hence of $\log L$. □
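The concavity claim in the proof can be checked by direct differentiation. The following is a sketch under explicit assumptions that are not taken from the source: each user's likelihood is the standard uncensored Weibull likelihood, the reparameterization is $\lambda_i' = \lambda_i^{-k_i}$, and $t_{i,j}$ ($j = 1, \ldots, m_i$) is a hypothetical notation for the positive observed delays of user $i$.

```latex
\begin{align*}
l_i'(\lambda_i', k_i) &= m_i \log k_i + (k_i - 1)\sum_{j=1}^{m_i} \log t_{i,j}
  + m_i \log \lambda_i' - \lambda_i' \sum_{j=1}^{m_i} t_{i,j}^{k_i}, \\
\frac{\partial l_i'}{\partial \lambda_i'} &= \frac{m_i}{\lambda_i'} - \sum_{j=1}^{m_i} t_{i,j}^{k_i}, \\
\frac{\partial^2 l_i'}{\partial \lambda_i'^2} &= -\frac{m_i}{\lambda_i'^2} < 0, \\
\frac{\partial l_i'}{\partial k_i} &= \frac{m_i}{k_i} + \sum_{j=1}^{m_i} \log t_{i,j}
  - \lambda_i' \sum_{j=1}^{m_i} t_{i,j}^{k_i} \log t_{i,j}, \\
\frac{\partial^2 l_i'}{\partial k_i^2} &= -\frac{m_i}{k_i^2}
  - \lambda_i' \sum_{j=1}^{m_i} t_{i,j}^{k_i} (\log t_{i,j})^2 < 0.
\end{align*}
```

Under these assumptions both second derivatives are strictly negative for $m_i \geq 1$, $\lambda_i' > 0$ and $k_i > 0$, which is consistent with the log-concavity statement in the proof.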
